Using query logs to establish vocabularies in distributed information retrieval
نویسندگان
چکیده
Users of search engines express their needs as queries, typically consisting of a small number of terms. The resulting search engine query logs are valuable resources that can be used to predict how people interact with the search system. In this paper, we introduce two novel applications of query logs, in the context of distributed information retrieval. First, we use query log terms to guide sampling from uncooperative distributed collections. We show that while our sampling strategy is at least as efficient as current methods, it consistently performs better. Second, we propose and evaluate a pruning strategy that uses query log information to eliminate terms. Our experiments show that our proposed pruning method maintains the accuracy achieved by complete indexes, while decreasing the index size by up to 60%. While such pruning may not always be desirable in practice, it provides a useful benchmark against which other pruning strategies can be measured. 2006 Elsevier Ltd. All rights reserved.
منابع مشابه
Improved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...
متن کاملImproved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...
متن کاملQEA: A New Systematic and Comprehensive Classification of Query Expansion Approaches
A major problem in information retrieval is the difficulty to define the information needs of user and on the other hand, when user offers your query there is a vast amount of information to retrieval. Different methods , therefore, have been suggested for query expansion which concerned with reconfiguring of query by increasing efficiency and improving the criterion accuracy in the information...
متن کاملPrototyping a Vibrato-Aware Query-By-Humming (QBH) Music Information Retrieval System for Mobile Communication Devices: Case of Chromatic Harmonica
Background and Aim: The current research aims at prototyping query-by-humming music information retrieval systems for smart phones. Methods: This multi-method research follows simulation technique from mixed models of the operations research methodology, and the documentary research method, simultaneously. Two chromatic harmonica albums comprised the research population. To achieve the purpose ...
متن کاملAnalyzing SQL Query Logs using Multi-Relational Graphs
Computer Science 6 (Data Management), FAU Erlangen-Nürnberg {andreas.wahl|richard.lenz}@fau.de Analytical SQL queries are a valuable source of information. They contain expert knowledge that cannot be inferred from schemas or content alone. Consider, for example, data lake scenarios, where relational and semi-structured data sources are combined in a single storage and processing environment. D...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Inf. Process. Manage.
دوره 43 شماره
صفحات -
تاریخ انتشار 2007